Interval Statistic

IntervalStatistic is a library for calculating interval estimates of the mean and the variance.
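The sketch below shows what classical interval estimates of the mean and variance look like. It does not use the IntervalStatistic API. `mean_confidence_interval` and `variance_confidence_interval` are hypothetical helpers. The mean interval assumes a sample large enough for the normal quantile to replace the Student t quantile. The variance interval assumes normally distributed data, with the chi-square quantiles supplied by the caller.

```julia
using Statistics

# Hypothetical helpers (not the IntervalStatistic API) showing classical
# interval estimates for the mean and the variance of a sample.

# Large-sample interval for the mean: mean ± z * s / sqrt(n).
# z = 1.959963984540054 is the 0.975 standard normal quantile (two-sided 95%);
# for small samples a Student t quantile would be more appropriate.
function mean_confidence_interval(xs::AbstractVector{<:Real}; z::Real = 1.959963984540054)
    n = length(xs)
    n > 1 || throw(ArgumentError("need at least two observations"))
    m = mean(xs)
    half_width = z * std(xs) / sqrt(n)   # std uses the unbiased n - 1 denominator
    return (m - half_width, m + half_width)
end

# Interval for the variance of a normal sample:
# ((n - 1) s^2 / q_upper, (n - 1) s^2 / q_lower), where q_lower and q_upper are
# the alpha/2 and 1 - alpha/2 quantiles of the chi-square distribution with
# n - 1 degrees of freedom (e.g. quantile(Chisq(n - 1), p) in Distributions.jl).
function variance_confidence_interval(xs::AbstractVector{<:Real}, q_lower::Real, q_upper::Real)
    n = length(xs)
    scaled = (n - 1) * var(xs)
    return (scaled / q_upper, scaled / q_lower)
end

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
lo, hi = mean_confidence_interval(xs)
# 0.4844 and 11.1433 are the 0.025 and 0.975 chi-square quantiles for 4 degrees of freedom
vlo, vhi = variance_confidence_interval(xs, 0.4844, 11.1433)
println("95% interval for the mean: [", lo, ", ", hi, "]")
println("95% interval for the variance: [", vlo, ", ", vhi, "]")
```

The chi-square quantiles are passed in as arguments so that the sketch needs only the `Statistics` standard library.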

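It also provides several algorithms for testing whether a sample follows a given distribution (k denotes the number of bins, h the bin width):

  • chi-square goodness-of-fit test

    • with k for large n
    • with Sturges' k
    • with Doane's k
    • with Wichard's k
    • with Scott's h
    • with Taylor's h
    • with Freedman-Diaconis' h

  • modified chi-square test for the normal distribution with a fixed number of bins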
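The sketch below implements the textbook versions of four of the rules listed above. These are Sturges, Doane, Scott and Freedman-Diaconis. It is not the library's code. The function names are hypothetical. The library's handling of rounding and sparse bins may differ, and so may its "large n", Wichard and Taylor variants. The bin counts printed later in this notebook need not match these raw values.

```julia
using Statistics

# Sketches of published bin-count (k) and bin-width (h) rules.
# IntervalStatistic's own implementation (rounding, any merging of sparse bins,
# and the "large n", Wichard and Taylor variants) is not reproduced here and may differ.

# Sturges: k = ceil(log2(n)) + 1
sturges_k(n::Integer) = ceil(Int, log2(n)) + 1

# Doane: k = 1 + log2(n) + log2(1 + |g1| / sigma_g1), where g1 is the sample skewness
function doane_k(xs::AbstractVector{<:Real})
    n = length(xs)
    m = mean(xs)
    m2 = sum((x - m)^2 for x in xs) / n
    m3 = sum((x - m)^3 for x in xs) / n
    g1 = m3 / m2^1.5
    sigma_g1 = sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    return ceil(Int, 1 + log2(n) + log2(1 + abs(g1) / sigma_g1))
end

# Scott: h = 3.49 * s * n^(-1/3), with s the sample standard deviation
scott_h(xs::AbstractVector{<:Real}) = 3.49 * std(xs) * length(xs)^(-1 / 3)

# Freedman-Diaconis: h = 2 * IQR * n^(-1/3)
function freedman_diaconis_h(xs::AbstractVector{<:Real})
    iqr = quantile(xs, 0.75) - quantile(xs, 0.25)
    return 2 * iqr * length(xs)^(-1 / 3)
end

# Turn a bin width h into a bin count by covering the sample range
bins_from_width(xs::AbstractVector{<:Real}, h::Real) = ceil(Int, (maximum(xs) - minimum(xs)) / h)

xs = collect(1.0:100.0)
println("Sturges k: ", sturges_k(length(xs)))
println("Doane k: ", doane_k(xs))
println("Scott bins: ", bins_from_width(xs, scott_h(xs)))
println("Freedman-Diaconis bins: ", bins_from_width(xs, freedman_diaconis_h(xs)))
```

The "k" rules give a bin count directly. The "h" rules give a bin width, which must then be converted into a number of bins over the sample range.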

Load Libraries


In [2]:
using IntervalStatistic
using Distributions
using ValidatedNumerics
using Plots
pyplot(reuse=true)
srand(10)   # fix the RNG seed so the samples are reproducible (Julia >= 1.0: using Random; Random.seed!(10))



In [3]:
function show_result(value_check_label)
    values, check, label = value_check_label
    # Run the goodness-of-fit test and print its verdict
    isDistr = IntervalStatistic.isDistribution(values, check)
    println(label, ": ", isDistr)

    # Each entry of `hist` holds a bin interval (i[1]) and the number of samples in it (i[2])
    hist = IntervalStatistic.Check.histogram(values, check)
    intervals = [i[1] for i in hist]
    println(label, " bin count: ", size(intervals, 1))

    midpoints = [mid(i) for i in intervals]
    weights = [i[2] for i in hist]
    all_count = sum(weights)
    # Divide each count by its bin width and the total count so the histogram is a density comparable to the pdf
    densities = [weights[i] / diam(intervals[i]) / all_count for i in 1:size(intervals, 1)]
    plot!(midpoints, densities, label=label)
end


Out[3]:
show_result (generic function with 1 method)
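`show_result` gets its pass/fail verdict from `IntervalStatistic.isDistribution`. The library's internals are not shown in this notebook. The sketch below illustrates, in general terms, what a chi-square goodness-of-fit check computes. The function `pearson_chi_square` is hypothetical and is not the library's implementation. It bins a sample using given edges and compares the observed counts with the counts expected under a hypothesised CDF.

```julia
# Hypothetical sketch (not IntervalStatistic's implementation) of Pearson's
# chi-square statistic: sum over bins of (observed - expected)^2 / expected,
# where expected = n * (F(right edge) - F(left edge)) for a hypothesised CDF F.
function pearson_chi_square(xs::AbstractVector{<:Real}, edges::AbstractVector{<:Real}, cdf)
    n = length(xs)
    k = length(edges) - 1
    observed = zeros(Int, k)
    for x in xs
        # bin i covers [edges[i], edges[i+1]); the last bin also includes edges[end]
        i = searchsortedlast(edges, x)
        if i == k + 1 && x == edges[end]
            i = k
        end
        # samples outside [edges[1], edges[end]] are skipped, so the edges should cover the sample
        if 1 <= i <= k
            observed[i] += 1
        end
    end
    expected = [n * (cdf(edges[i + 1]) - cdf(edges[i])) for i in 1:k]
    return sum((observed .- expected) .^ 2 ./ expected)
end

uniform_cdf(x) = clamp(x, 0.0, 1.0)   # CDF of Uniform(0, 1)
edges = [0.0, 0.5, 1.0]
balanced = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9, 0.95]
skewed = [fill(0.25, 8); fill(0.75, 2)]
println(pearson_chi_square(balanced, edges, uniform_cdf))   # observed [5, 5] vs expected [5, 5]
println(pearson_chi_square(skewed, edges, uniform_cdf))     # observed [8, 2] vs expected [5, 5]
```

In the textbook test, the hypothesis is rejected when the statistic exceeds the upper chi-square quantile at the chosen significance level. The distribution here is fully specified, as it is when the checks below receive `Normal(mu, sigma)`, so there are k - 1 degrees of freedom. For the two-bin example this gives 1 degree of freedom, with a 5% critical value of about 3.841. A statistic of 3.6 would therefore not lead to rejection.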

Generate samples from the standard normal distribution


In [4]:
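d = Normal()                   # standard normal: mu = 0, sigma = 1
n = 500                        # sample size (named `n` so it does not shadow Base.length)
confidence_probability = 0.95  # not used directly; the checks below are given 0.05 = 1 - 0.95
values = rand(d, n)
mu, sigma = params(d)
average = reduce(+, values) / n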


Out[4]:
0.020139963008043885

In [5]:
result_by_sturges_chi_square = (
    values,
    IntervalStatistic.Check.SturgesChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Sturges k formula"
)

result_by_scott_chi_square = (
    values,
    IntervalStatistic.Check.ScottChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Scott h formula"
)

result_by_taylor_chi_square = (
    values,
    IntervalStatistic.Check.TaylorChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Taylor h formula"
)

result_by_freedmandiaconis_chi_square = (
    values,
    IntervalStatistic.Check.FreedmanDiaconisChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Freedman-Diaconis h formula"
)

result_by_doane_chi_square = (
    values,
    IntervalStatistic.Check.DoaneChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Doane k formula"
)

result_by_wichard_chi_square = (
    values,
    IntervalStatistic.Check.WichardChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Wichard k formula"
)

result_by_large_n_chi_square = (
    values,
    IntervalStatistic.Check.LargeNChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with k formula for large n"
)

result_by_modified_chi_square_for_normal_dist_10_bins = (
    values,
    IntervalStatistic.Check.ChiSquareNormalCheck(0.05, 10, mu, sigma),
    "Modified chi-square for normal dist with 10 bins"
)

result_by_modified_chi_square_for_normal_dist_11_bins = (
    values,
    IntervalStatistic.Check.ChiSquareNormalCheck(0.05, 11, mu, sigma),
    "Modified chi-square for normal dist with 11 bins"
)
# Plot the true pdf over mu ± 3 sigma; each show_result call below overlays a normalised histogram
plot((mu - 3*sigma):(sigma*0.01):(mu + 3*sigma), (x) -> pdf(d, x), label="pdf")

show_result(result_by_sturges_chi_square)
show_result(result_by_large_n_chi_square)
show_result(result_by_wichard_chi_square)
show_result(result_by_doane_chi_square)
show_result(result_by_freedmandiaconis_chi_square)
show_result(result_by_scott_chi_square)
show_result(result_by_taylor_chi_square)
show_result(result_by_modified_chi_square_for_normal_dist_10_bins)
show_result(result_by_modified_chi_square_for_normal_dist_11_bins)


[Plots.jl] Initializing backend: pyplot
Chi-square with Sturges k formula: false
Chi-square with Sturges k formula bin count: 6
Chi-square with k formula for large n: false
Chi-square with k formula for large n bin count: 28
Chi-square with Wichard k formula: false
Chi-square with Wichard k formula bin count: 6
Chi-square with Doane k formula: false
Chi-square with Doane k formula bin count: 6
Chi-square with Freedman-Diaconis h formula: false
Chi-square with Freedman-Diaconis h formula bin count: 12
Chi-square with Scott h formula: false
Chi-square with Scott h formula bin count: 12
Chi-square with Taylor h formula: false
Chi-square with Taylor h formula bin count: 12
Modified chi-square for normal dist with 10 bins: false
Modified chi-square for normal dist with 10 bins bin count: 10
Modified chi-square for normal dist with 11 bins: true
Modified chi-square for normal dist with 11 bins bin count: 11
Out[5]:

Generate samples from a normal distribution with mu = 100 and sigma = 4


In [6]:
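d = Normal(100, 4)             # mu = 100, sigma = 4
n = 500                        # sample size (named `n` so it does not shadow Base.length)
confidence_probability = 0.95  # not used directly; the checks below are given 0.05 = 1 - 0.95
values = rand(d, n)
mu, sigma = params(d)
average = reduce(+, values) / n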


Out[6]:
99.96458979418557

In [7]:
result_by_sturges_chi_square = (
    values,
    IntervalStatistic.Check.SturgesChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Sturges k formula"
)

result_by_scott_chi_square = (
    values,
    IntervalStatistic.Check.ScottChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Scott h formula"
)

result_by_taylor_chi_square = (
    values,
    IntervalStatistic.Check.TaylorChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Taylor h formula"
)

result_by_freedmandiaconis_chi_square = (
    values,
    IntervalStatistic.Check.FreedmanDiaconisChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Freedman-Diaconis h formula"
)

result_by_doane_chi_square = (
    values,
    IntervalStatistic.Check.DoaneChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Doane k formula"
)

result_by_wichard_chi_square = (
    values,
    IntervalStatistic.Check.WichardChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with Wichard k formula"
)

result_by_large_n_chi_square = (
    values,
    IntervalStatistic.Check.LargeNChiSquareCheck(0.05, Normal(mu, sigma)),
    "Chi-square with k formula for large n"
)

result_by_modified_chi_square_for_normal_dist_10_bins = (
    values,
    IntervalStatistic.Check.ChiSquareNormalCheck(0.05, 10, mu, sigma),
    "Modified chi-square for normal dist with 10 bins"
)

result_by_modified_chi_square_for_normal_dist_11_bins = (
    values,
    IntervalStatistic.Check.ChiSquareNormalCheck(0.05, 11, mu, sigma),
    "Modified chi-square for normal dist with 11 bins"
)
# Plot the true pdf over mu ± 3 sigma; each show_result call below overlays a normalised histogram
plot((mu - 3*sigma):(sigma*0.01):(mu + 3*sigma), (x) -> pdf(d, x), label="pdf")

show_result(result_by_sturges_chi_square)
show_result(result_by_large_n_chi_square)
show_result(result_by_wichard_chi_square)
show_result(result_by_doane_chi_square)
show_result(result_by_freedmandiaconis_chi_square)
show_result(result_by_scott_chi_square)
show_result(result_by_taylor_chi_square)
show_result(result_by_modified_chi_square_for_normal_dist_10_bins)
show_result(result_by_modified_chi_square_for_normal_dist_11_bins)


Chi-square with Sturges k formula: false
Chi-square with Sturges k formula bin count: 6
Chi-square with k formula for large n: false
Chi-square with k formula for large n bin count: 26
Chi-square with Wichard k formula: false
Chi-square with Wichard k formula bin count: 6
Chi-square with Doane k formula: false
Chi-square with Doane k formula bin count: 6
Chi-square with Freedman-Diaconis h formula: false
Chi-square with Freedman-Diaconis h formula bin count: 13
Chi-square with Scott h formula: false
Chi-square with Scott h formula bin count: 13
Chi-square with Taylor h formula: false
Chi-square with Taylor h formula bin count: 13
Modified chi-square for normal dist with 10 bins: true
Modified chi-square for normal dist with 10 bins bin count: 10
Modified chi-square for normal dist with 11 bins: true
Modified chi-square for normal dist with 11 bins bin count: 11
Out[7]: